REVERSING A SIMPLE VIRTUAL MACHINE

1 Retrieving instructions and registers 

Well, tonight I'm tired, I've downloaded a bunch of nice songs that I like a lot, and it's time to reverse. Having
received requests about this tutorial, and contrary to my habits, I'll write a small one.

I've heard people talk over and over about HyperUnpackMe2, so in the end I opened it. I fired up my IDA 4.3 (yeah, I
don't use the cracked one... toolz are, after all, for those who can't do things without them...).

So, I opened the crackme. It starts with a lot of ugly anti-IDA tricks, which require you to undefine (U key) the
jump/call targets and then redefine the pointed area as 'code' (C key). It hides the pointer to LoadLibrary and strings
such as "VirtualAlloc" this way. OK, funny but not interesting: we want to see the virtual machine. Let's hope it is not
encrypted, otherwise we have to fire up Olly and unpack the packer until the VM is in the clear...

So, how do we search for a VM in the code using IDA 4.3? Simple: use the scrollbar and the most ancient of reversing
tools: Zen.

What are we looking for? What could be a 'Zen' point? Well, when I browsed the aspr 1.2 dll I found the push sequence
followed by a ret to be a 'Zen' point (indeed, it was the to-do list of that packer). And for a VM? Well, a VM is made
of instruction emulation routines, which are usually functions or addresses that a common loop of code jumps to. In this
case, we look for lists of pointers/functions. Yes, such lists can be many things. They could be objects, for example,
which are stored this way. How can we distinguish them from a VM? And what if the VM is coded as an object in a
high-level language?

The answer is rather simple. Start examining these procedures and look for recurrent patterns. For example, if they
refer to the same parameters, and the same parameter is used in the same way in more than one of these functions, you
might be in the presence of a VM. Personally, I always try to find references to common attack points, such as the
program counter (the EIP equivalent). This might not always be simple: bound-flow VMs like *F are fairly complex (by
the way, you can log them with various techniques).
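The hunt for pointer lists can also be roughly automated. The Python sketch below is only an illustration, not what was done here (the scrollbar and Zen did the job): it scans a raw byte buffer for runs of consecutive little-endian dwords that all fall inside an assumed image range. The function name, the range and the minimum run length are assumptions made for the example.

```python
import struct

def find_pointer_runs(data, lo, hi, min_run=4):
    """Return (offset, count) for runs of aligned dwords with lo <= value < hi."""
    runs = []
    start, count = None, 0
    usable = len(data) - len(data) % 4
    for off in range(0, usable, 4):
        (value,) = struct.unpack_from("<I", data, off)
        if lo <= value < hi:
            if count == 0:
                start = off          # a new candidate table begins here
            count += 1
        else:
            if count >= min_run:
                runs.append((start, count))
            count = 0
    if count >= min_run:             # table running up to the end of the buffer
        runs.append((start, count))
    return runs
```

Only offsets that are multiples of 4 from the start of the buffer are checked, so in practice you would probably repeat the scan on the buffer shifted by 1, 2 and 3 bytes. A hit is just a candidate: the handlers still have to be examined by hand.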

But let's get back to the crackme. Let's say that by scrolling, looking around and randomly following jumps and procs
we found an interesting list, such as the next one:
[IDA listing: a long table of "dd offset" pointer entries in the TheHyper segment. Most of it did not survive
extraction; the legible tail reads:]

TheHyper:0104A737 off_104A737 dd offset unk_104A075    ; DATA XREF
TheHyper:0104A73B             dd offset unk_104A083
TheHyper:0104A73F             dd offset unk_104A095
TheHyper:0104A743 off_104A743 dd offset unk_104A0A2    ; DATA XREF
TheHyper:0104A747             dd offset unk_104A0B2
TheHyper:0104A74B             dd offset unk_104A0C4
TheHyper:0104A74F off_104A74F dd offset unk_104A0D1    ; DATA XREF

Doesn't it look interesting? A long table of pointers. Let's explore one of those secondary links (the first table of
links just points to the heads of the second-level ones, mmh!).



TheHyper:0104A039             db    0
TheHyper:0104A03A unk_104A03A db  31h ; 1
TheHyper:0104A03B             db  37h ; 7
TheHyper:0104A03C             db 0E9h
TheHyper:0104A03D             db  20h
TheHyper:0104A03E             db    1
TheHyper:0104A03F             db    0
TheHyper:0104A040             db    0

IDA gives us this stuff as data, but after pressing C for marking it as code it becomes... 
TheHyper:0104A03A loc_104A03A:
TheHyper:0104A03A     xor [edi], esi
TheHyper:0104A03C     jmp loc_104A161
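The jump target can be double-checked by hand from the raw bytes. A small Python sketch, assuming the five bytes starting at 0104A03C (E9 20 01 00 00) are a near jmp with a 32-bit relative displacement; the helper name is made up for the example:

```python
import struct

def jmp_rel32_target(address, raw):
    """Target of an x86 'E9 rel32' jump located at `address`."""
    if len(raw) < 5 or raw[0] != 0xE9:
        raise ValueError("not an E9 rel32 jump")
    (rel,) = struct.unpack_from("<i", raw, 1)   # signed 32-bit displacement
    return (address + 5 + rel) & 0xFFFFFFFF     # relative to the next instruction

print(hex(jmp_rel32_target(0x0104A03C, bytes([0xE9, 0x20, 0x01, 0x00, 0x00]))))
# prints 0x104a161
```

The displacement is counted from the end of the 5-byte instruction, which is why 5 is added before the displacement.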



Interesting, no? An XOR operation followed by a jump. Let's press 'C' on all the chunks to see what happens:



TheHyper:0104A12B loc_104A12B:                    ; DATA XREF
TheHyper:0104A12B     mov ecx, esi
TheHyper:0104A12D     shr dword ptr [edi], cl
TheHyper:0104A12F     jmp short loc_104A161
TheHyper:0104A131
TheHyper:0104A131 loc_104A131:                    ; DATA XREF
TheHyper:0104A131     mov ecx, esi
TheHyper:0104A133     shl byte ptr [edi], cl
TheHyper:0104A135     jmp short loc_104A161
TheHyper:0104A137
TheHyper:0104A137 loc_104A137:                    ; DATA XREF
TheHyper:0104A137     mov ecx, esi
TheHyper:0104A139     shl word ptr [edi], cl
TheHyper:0104A13C     jmp short loc_104A161
TheHyper:0104A13E
TheHyper:0104A13E loc_104A13E:                    ; DATA XREF
TheHyper:0104A13E     mov ecx, esi
TheHyper:0104A140     shl dword ptr [edi], cl
TheHyper:0104A142     jmp short loc_104A161



This is the first place where I originally pressed 'C'. Examine the code. All these snippets jump to the same address,
which means they share a common epilogue.

Notice the first instruction: a repeating mov ecx, esi in all the entries! Doesn't it look like a pattern to you?
Maybe the same logical parameter is passed in esi? Clearly, it is the shift count used in the next instruction, a shl.
The snippets also all use [edi] as the target of the shift. And the three shl blocks have the same structure, changing
only the operand size of the core (the 'acting') instruction: byte ptr, word ptr, dword ptr. Might this be a virtual
shl instruction in its three operand sizes? Yeah!

So we have understood that here the source parameter for SHL is passed in esi, the destination clearly in edi, and
we have a sequence of shl on byte / shl on word / shl on dword.

We have been lucky, however. VMs often have an instruction set with a more complex structure. This VM does not
implement many of the complexities related to the different kinds of register/memory/displacement references within
the instructions, as it seems to use a fixed source/destination convention: esi carries the generic source operand,
and edi is a generic pointer to the destination (as we will see by reversing more, generic VM registers are passed to
the VM instructions by memory reference, i.e. if the destination of a SHL is the generic VM register R1, edi contains
the pointer to R1).
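To make the convention concrete, here is a rough Python model of the three shl handlers. It is a sketch under assumptions: VM memory is modelled as a bytearray, edi as an offset into it, and the function and parameter names are invented. Like the real shl, it takes the count from cl and masks it to 5 bits.

```python
def vm_shl(memory, edi, esi, size):
    """Model of 'mov ecx, esi / shl <size> ptr [edi], cl' on a bytearray."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1 (byte), 2 (word) or 4 (dword)")
    count = esi & 0xFF & 0x1F                  # cl, masked to 5 bits by the CPU
    value = int.from_bytes(memory[edi:edi + size], "little")
    value = (value << count) & ((1 << (8 * size)) - 1)   # bits shifted out are lost
    memory[edi:edi + size] = value.to_bytes(size, "little")

mem = bytearray([0x81, 0x00, 0x00, 0x00])
vm_shl(mem, 0, 1, 1)        # shl byte ptr: 0x81 -> 0x02, the top bit is lost
print(mem.hex())            # prints 02000000
```

The same call with size 4 would instead carry the top bit into the next byte, which is the only difference between the three handlers.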
A usual and pretty standard attack point in VMs is the NOP instruction equivalent. How can you discover it? Simple.
It does nothing but update the internal state of the VM. So, an instruction that just updates a register which seems
to be used as the program counter is very probably the NOP of such a VM. This crackme's virtual machine is pretty
straightforward, however, so we just attacked it by recognizing complex instructions directly.

Now it is time to reverse all these instruction blocks and name them. The result will look something like this:



BINARY_TABLE    dd offset MOV                 ; DATA XREF
                dd offset ADD
                dd offset SUB
                dd offset XOR
                dd offset AND
                dd offset OR
                dd offset IMUL
                dd offset IDIV
                dd offset IDIV_REST
                dd offset ROR
                dd offset ROL
                dd offset SHR
                dd offset SHL
                dd offset CMP
XOR             dd offset XOR_BYTEPTR         ; DATA XREF
                dd offset XOR_WORDPTR
                dd offset XOR_DWORDPTR
MOV             dd offset MOV_BYTEPTR         ; DATA XREF
                dd offset MOV_WORDPTR
                dd offset MOV_DWORDPTR
ADD             dd offset ADD_BYTEPTR         ; DATA XREF
                dd offset ADD_WORDPTR
                dd offset ADD_DWORDPTR
SUB             dd offset SUB_BYTEPTR         ; DATA XREF
                dd offset SUB_WORDPTR
                dd offset SUB_DWORDPTR
OR              dd offset OR_BYTEPTR          ; DATA XREF
                dd offset OR_WORDPTR
                dd offset OR_DWORDPTR
AND             dd offset AND_BYTEPTR         ; DATA XREF
                dd offset AND_WORDPTR
                dd offset AND_DWORDPTR
IMUL            dd offset IMUL_BYTEPTR        ; DATA XREF
                dd offset IMUL_WORDPTR
                dd offset IMUL_DWORDPTR
IDIV            dd offset IDIV_BYTEPTR        ; DATA XREF
                dd offset IDIV_WORDPTR
                dd offset IDIV_DWORDPTR
IDIV_REST       dd offset IDIV_REST_BYTEPTR



All these instructions are structured (more or less) exactly like the shl one. One interesting point to observe is the
idiv instruction. As you may notice, it has been split into IDIV and IDIV_REST. As you remember, IDIV also returns
the remainder of the division. If you examine how the two virtual opcodes are implemented, you'll notice:
IDIV_DWORDPTR:                          ; DATA XREF
    xor edx, edx
    mov eax, [edi]
    idiv esi
    mov [edi], eax
    jmp end_of_binary_instruction

IDIV_REST_DWORDPTR:                     ; DATA XREF
    mov eax, [edi]
    xor edx, edx
    idiv esi
    mov [edi], edx
    jmp short end_of_binary_instruction



The two handlers store a different register into [edi]. This should make you think: why? Simple. One is the quotient
(EAX), the other the remainder (EDX). Since the VM instructions are structured to work on two operands
(source/destination), the author needed to split the work of a three-operand instruction into two opcodes.
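As a rough check of the split, the pair can be modelled in Python. This is only a sketch: the function names are made up, and it mirrors the listings literally, so the dividend is EDX:EAX with EDX cleared (the destination is in effect treated as unsigned) while the divisor in esi is signed. x86 idiv truncates toward zero, unlike Python's // operator, and the real instruction faults when the quotient does not fit in 32 bits; the sketch ignores that case.

```python
def _to_signed32(value):
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def _idiv(dest, src):
    """Quotient and remainder of 0:dest divided by signed src, truncating toward zero."""
    dividend = dest & 0xFFFFFFFF          # xor edx, edx / mov eax, [edi]
    divisor = _to_signed32(src)           # idiv esi
    if divisor == 0:
        raise ZeroDivisionError("idiv by zero raises #DE on a real CPU")
    quotient = dividend // abs(divisor)   # dividend is never negative here
    if divisor < 0:
        quotient = -quotient
    remainder = dividend - quotient * divisor
    return quotient & 0xFFFFFFFF, remainder & 0xFFFFFFFF

def vm_idiv(dest, src):
    return _idiv(dest, src)[0]            # mov [edi], eax

def vm_idiv_rest(dest, src):
    return _idiv(dest, src)[1]            # mov [edi], edx

print(vm_idiv(17, 5), vm_idiv_rest(17, 5))   # prints 3 2
```

Both opcodes repeat the whole division and differ only in the register they write back, which matches the two listings.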

Notice that, before rebuilding a VM, I usually look at the whole instruction set, trying to figure out something
important we haven't talked about yet. I always look for hints about the structure of the VM registers. For example,
when I found the following instruction, I first thought:



CMP_DWORDPTR:                           ; DATA XREF
    cmp [edi], esi
    pushf
    pop dword ptr [eax+0Ch]             ; set VM Flags
    jmp short $+2

end_of_binary_instruction:              ; CODE XREF
    leave
    retn 8



"PUSHF"??? Why does he need a PUSHF instruction here? He is saving the flags after a comparison. Mmh... and then he
pops them into a structure addressed through the EAX register. Is EAX used with displacements in other VM code
snippets? Yes, of course.

At this point ask yourself: why should one save the flags after a comparison into such a structure? In case you have
not understood it yet, [EAX+0Ch] clearly points to the virtual EFLAGS register. So we can open the IDA structures
window, create a structure and add doublewords until we reach "field_0Ch", which we'll rename to VM.EFLAGS or the
like.
CMP_DWORDPTR:                           ; DATA XREF
    cmp [edi], esi
    pushf
    pop [eax+VM.EFLAGS]                 ; set VM Flags
    jmp short $+2



As in the sample above. 

Now we have identified our first VM register! Let's hunt for the others while reversing opcodes. Among the
instructions we also find the next one:



loc_104A1FD:                            ; DATA XREF
    sub dword ptr [eax+10h], 4
    mov edx, [edi]
    and edx, 0FFFFh
    mov [esi-4], edx
    jmp short end_of_unary_instruction



When I saw it I noted: it takes a fixed VM register (fixed because the offset from the VM structure base, eax, is
fixed: 10h) and subtracts 4 from it. It takes the operand from edi, masks out the upper 2 bytes and then stores what
is left. What asm operation do you know that decreases a register when writing?

C'mon... maybe it is clearer now...



loc_104A20E:                            ; DATA XREF
    sub [eax+VM.ESP], 4
    mov edx, [edi]
    mov [esi-4], edx
    jmp short end_of_unary_instruction



...I hope I don't need to comment on it. This is PUSH DWORD.
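In Python terms the handler behaves roughly like the sketch below. The names are invented for the example, the virtual stack is modelled as a bytearray with VM.ESP as an offset into it, and it assumes that esi holds the old VM.ESP value when the store happens, which the snippet suggests but does not show.

```python
import struct

def vm_push_dword(stack, vm_esp, value):
    """Decrement the virtual ESP by 4, store value there, return the new ESP."""
    new_esp = vm_esp - 4                               # sub [eax+VM.ESP], 4
    if new_esp < 0:
        raise OverflowError("virtual stack exhausted")
    struct.pack_into("<I", stack, new_esp, value & 0xFFFFFFFF)   # mov [esi-4], edx
    return new_esp

stack = bytearray(16)
esp = vm_push_dword(stack, 16, 0x11223344)
print(esp, stack[esp:esp + 4].hex())    # prints 12 44332211
```

The word variant seen just before would differ only in masking the value with 0FFFFh first; it still moves the virtual ESP by 4.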

And another VM register is uncovered. Let's go on: we still miss the EIP, the generic registers... Let's find them.
Browsing the instructions, we can find:



JZ:                                     ; DATA XREF
    push dword ptr [eax+0Ch]
    popf
    jz short loc_104A32A
    mov [eax+8], edi
loc_104A32A:                            ; CODE XREF
    jmp short end_of_flow_instruction



© 2006 CodeBreakers Magazine 






Now, this instruction has the same layout as the CMP, but it features a JZ instruction. It is a jump, good. EIP must
be used here: we jump somewhere, so we must alter the EIP register somehow. We already know what EAX+0Ch is: our VM
EFLAGS. So, here the virtual eflags are moved into the CPU eflags, and JZ is executed. If the jump is NOT taken, the
edi parameter is stored at eax+8. We know that eax contains our VM context, so we can bet that the instruction
parameter copied there is... our new EIP after the jump (technically, this means that the instruction is JNZ, not
JZ).
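The behaviour just described can be sketched as follows (Python, invented names; ZF is bit 6 of EFLAGS, mask 40h). The handler skips the store when ZF is set, so the virtual EIP is redirected only when ZF is clear:

```python
ZF = 0x40   # zero flag, bit 6 of EFLAGS

def vm_jump_if_not_zero(vm_eflags, vm_eip, target):
    """Return the next VM.EIP: `target` when ZF is clear, else unchanged."""
    if vm_eflags & ZF:          # jz short: skip the store
        return vm_eip
    return target               # mov [eax+8], edi

print(hex(vm_jump_if_not_zero(0x246, 0x1000, 0x2000)))   # ZF set   -> 0x1000
print(hex(vm_jump_if_not_zero(0x202, 0x1000, 0x2000)))   # ZF clear -> 0x2000
```

Whether the decoder has already advanced the virtual EIP past the instruction before the handler runs is not visible in this snippet, so the sketch simply leaves it untouched in the fall-through case.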

So... 



JZ:                                     ; DATA XREF
    push [eax+VM.EFLAGS]
    popf
    jz short loc_104A32A
    mov [eax+VM.EIP], edi
loc_104A32A:                            ; CODE XREF
    jmp short end_of_flow_instruction

We found the VM EIP register. Now, try to identify the next instruction yourself:



                                        ; DATA XREF
    mov edx, [eax+VM.EIP]
    mov ecx, [eax+VM.ESP]
    sub [eax+VM.ESP], 4
    mov [ecx-4], edx
    mov [eax+VM.EIP], edi
    jmp short end_of_flow_instruction



I won't give you any hint, except that it is clearly an instruction that uses ESP and EIP. Think, please.

One last interesting point. You should always keep in mind that the VM author is not tied to any 'rule' when coding
a VM. So, instructions need not be 'standard'. They can do anything their creator wishes. For example, one
instruction does this:

    mov esi, esp
    mov edx, [eax+VM.ESP]
    mov esp, edx
    call edi
    mov edx, [ebp+VM.EIP]
    mov [edx+38h], eax
    mov esp, esi
    jmp short $+2



You should notice this: it uses the real ESP register! Why? It saves the real ESP, then takes the virtual stack and
sets it as the REAL stack. And it calls a function via EDI. This means that this virtual machine is capable of making
calls in real CPU space, by pushing virtual parameters onto the virtual stack and then executing this instruction,
which swaps the stacks (it is a bit reminiscent of the stack switching with parameter copying done by inter-privilege
gates, if you know processors well). Also note that the return value of the function executed on the real CPU is
saved within our VM context, somewhere...
I reversed almost all of the VM instruction set and registers in half an hour, and you can do the same with little
effort. There are only a handful of instructions that are more complex, but they are not important for VM reversing
(I mean, for understanding the general structure).

Well, it is time for me to go to sleep, it's very, very late! I hope you appreciated the small tute.
2 General VM Structure



...next time ;-) 

...Well, next time has come, let's fire up the mp3 player with 'Liga' :-)

If we examine the general structure of a VM, we usually find a big loop that takes care of running the VM across the
virtual assembler, thereby emulating the complex stages a processor goes through when fetching, decoding and
executing instructions. The HyperCrackme2 uses this generic VM structure:

1. Set up the VM context.

2. Enter the VM loop.

3. Read the byte at the VM.EIP address and check the instruction type, supporting various instruction types:

   1. Binary instructions.

   2. Unary instructions.

   3. Flow control instructions.

   4. Special instructions.

   5. Debug instructions.

   6. NOP and HLT (alias "Quit VM") instructions, the latter ending the VM loop.

4. Jump to the start of the VM loop.

This structure is general enough to be kept in mind. From a generic point of view, each VM contains the following
elements:

• The initialization block/function of the virtual machine.

• A loop block/function that scans and executes the instructions of the VM program.

• A generic block/function that decodes the opcode of the VM instruction, with its parameters, registers, indexing
modes and anything else the VM creator wanted to put in.

• A list of VM instruction code blocks, each of which performs the duty of one instruction. They are roughly the
equivalent of the micro-code that modern CPUs use for decomposing and executing common ASM instructions.

• A set of macro-instructions, specific to the VM and not easily mappable to ASM opcodes. These instructions might
be harder to understand.
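The elements listed above can be put together in a toy fetch/decode/execute loop. The Python sketch below is purely illustrative: the opcodes, their encodings and the context layout are invented and are not those of the crackme.

```python
def run_vm(program, handlers, max_steps=1000):
    """Toy fetch/decode/execute loop. handlers maps opcode -> (length, function)."""
    context = {"eip": 0, "regs": [0] * 4, "halted": False}   # 1. set up the context
    for _ in range(max_steps):                               # 2. the VM loop
        if context["halted"]:
            break
        opcode = program[context["eip"]]                     # 3. fetch the opcode byte
        length, handler = handlers[opcode]                   #    decode: pick the handler
        operands = program[context["eip"] + 1:context["eip"] + length]
        context["eip"] += length                             #    advance the virtual EIP
        handler(context, operands)                           #    execute
    return context                                           # 4. loop until HLT

def op_nop(context, operands):
    pass

def op_load(context, operands):          # hypothetical: LOAD reg, imm8
    context["regs"][operands[0]] = operands[1]

def op_add(context, operands):           # hypothetical: ADD dst_reg, src_reg
    dst, src = operands
    context["regs"][dst] = (context["regs"][dst] + context["regs"][src]) & 0xFFFFFFFF

def op_hlt(context, operands):
    context["halted"] = True

HANDLERS = {0x00: (1, op_nop), 0x01: (3, op_load), 0x02: (3, op_add), 0xFF: (1, op_hlt)}
program = bytes([0x01, 0, 5, 0x01, 1, 7, 0x00, 0x02, 0, 1, 0xFF])
print(run_vm(program, HANDLERS)["regs"][0])   # prints 12
```

Note how the NOP handler does nothing, yet the virtual EIP still advances: this is the pattern described earlier as a way to spot NOP equivalents.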
An example of the HyperCrackme2 initial structure elements can be seen by examining the following commented
IDA snip:



TheHyper:0104A615     RESTART_VM_PROCESS:                     ; CODE XREF
TheHyper:0104A615 028     xor  ebx, ebx
TheHyper:0104A617 028     xor  edx, edx
TheHyper:0104A619 028     xor  ecx, ecx
TheHyper:0104A61B 028     mov  eax, [ebp+VM_CONTEXT]
TheHyper:0104A61E 028     mov  eax, [eax+VM.EIP]
TheHyper:0104A621 028     mov  cl, [eax]                      ; CHECK FIRST BYTE OP
TheHyper:0104A623 028     cmp  cl, 0Dh                        ; >0DH IS UNARY ETC.
TheHyper:0104A626 028     ja   short IS_UNARY_INSTR
TheHyper:0104A628 028     push [ebp+VM_CONTEXT]
TheHyper:0104A62B 02C     call Setup_Binary_Instruction_Params ; set ESI/EDI/ECX value
TheHyper:0104A630 028     push offset BINARY_TABLE
TheHyper:0104A635 02C     push [ebp+VM_CONTEXT]
TheHyper:0104A638 030     call EXECUTE_VM_INSTRUCTION2        ; ecx == Instruction Index in ...
TheHyper:0104A638                                             ; esi == 1st operand // edi == 2nd oper...
TheHyper:0104A63D 030     jmp  short END_OF_FETCHER



As you can see, RESTART_VM_PROCESS is point (2) of the above description, whereas the part under the
ja short IS_UNARY_INSTR is equivalent to point (3.1). The code in this snippet, apart from clearing the registers,
prefetches the first instruction opcode (the byte pointed to by VM.EIP) and analyses it to choose which 'execution
unit' of the VM should be used for the instruction type being fetched.

Let's now examine one of the 'building blocks' of this VM, the Setup_Binary_Instruction_Params function, which
takes care of processing the binary VM opcodes. While examining the next fragment, remember that EAX contains
our VM_CONTEXT. So, we already know that eax+8 refers to our VM.EIP.

I think it is important now to understand what we are looking for, or the analysis will be useless. We are trying to
recover the VM instruction structure, together with a more detailed description of the virtual machine structure.
The procedure that fills in the parameters for the binary instructions must know how to decode the binary
instructions, so by examining how the bytes that make up an opcode are used we can rebuild the VM instruction
format. What should we expect to find? It depends heavily on the complexity of the instruction set, which depends
entirely on the author's choices. Which we must reverse. So, we must always examine carefully how the instruction's
bytes are used, as they can change from instruction type to instruction type. And please remember that VM
instructions are not compelled to always be of the same size, just as x86 instructions are not all of the same size...

You won't be able to apply the method used below to other VMs as is. Each VM uses its own opcode and VM structure,
so you should try to understand which fragments can be used as hints for its reconstruction.



Let's start by examining this code: 

01049F1B     mov [ebp+var_2], 0
01049F1F     mov eax, [eax+8]
01049F22     mov bl, [eax+1]
01049F25     mov dl, bl
This snippet should be clear: we load the second byte pointed to by our virtual EIP, [eax+1], then we move it into
the dl register. Before commenting on this point in detail, we should take note that we've just used one of the bytes
that make up an instruction. Let's move on.



01049F4C     test dl, 4
01049F4F     jz short loc_1049F71
01049F51     mov cl, [eax+2]
01049F54     and cl, 0F0h
01049F57     shr cl, 4
01049F5A     lea edi, [edi+ecx*4+10h]
01049F5E     mov [ebp+var_2], 1



This snippet is (conceptually) pretty similar to the prior one. EAX still contains our VM.EIP address, and now the
third byte forming the opcode is loaded and processed (technically, only the high nibble of it is used, as you can
notice from the and/shr pair). And notice the instruction that follows. EDI contains our VM_CONTEXT pointer here.
So, the ECX register contains a dword index, which is applied to the VM_CONTEXT to obtain a dword pointer, which is
then offset by 10h. But do you remember? VM_CONTEXT+10h == VM.ESP. This means that when ECX is 0h here, the ESP
register is addressed. And when it is 1h, the dword after it is addressed, and so on up to the 15th DWORD after ESP
(a nibble ranges over 0-15, you know...). So, we have just detected a possible usage of the third byte of the binary
opcodes, or at least of its upper nibble. The snippet below is the area we jump to when the jz instruction in the
code above is taken.
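The address computation performed by the lea can be written out as a small Python helper. It is a sketch: the function name and the context address are invented, while the 10h base and the 4-byte stride come from the snippet above.

```python
def vm_register_address(vm_context, regs_byte):
    """Address selected by the high nibble of the REGS byte: ctx + 10h + 4*index."""
    index = (regs_byte & 0xF0) >> 4        # and cl, 0F0h / shr cl, 4
    return vm_context + 0x10 + index * 4   # lea edi, [edi+ecx*4+10h]

base = 0x00500000                              # made-up context address
print(hex(vm_register_address(base, 0x00)))    # prints 0x500010, the virtual ESP
print(hex(vm_register_address(base, 0x3A)))    # prints 0x50001c, third dword after ESP
```

The low nibble of the byte is ignored here; only the upper four bits pick the register.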

01049F71
01049F71 loc_1049F71:
01049F71     mov edi, [eax+4]
01049F74     add dh, [ebp+var_1]



As you can notice, it takes the dword that follows the first dword at EAX (which is our VM.EIP) and places it in
EDI. And we know that EDI will in the end contain the destination parameter of the VM opcode! This helps us
understand that the first dword is used only for opcode purposes, and that the opcode parameters come after it.



This is what we know of our VM CONTEXT right now: 



0008 EIP     dd ?
000C EFLAGS  dd ?
0010 ESP     dd ?
0014 REGS    dd 15 dup(?)
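When writing tools against this layout, the same structure can be mirrored with Python's ctypes to double-check the offsets. The two leading dwords have not been identified in the text, so they appear here as unknown placeholders; the class and field names are assumptions.

```python
import ctypes

class VMContext(ctypes.Structure):
    _fields_ = [
        ("unknown_00", ctypes.c_uint32),     # +00h, not identified yet
        ("unknown_04", ctypes.c_uint32),     # +04h, not identified yet
        ("EIP", ctypes.c_uint32),            # +08h
        ("EFLAGS", ctypes.c_uint32),         # +0Ch
        ("ESP", ctypes.c_uint32),            # +10h
        ("REGS", ctypes.c_uint32 * 15),      # +14h, generic registers
    ]

# Print each known field with its offset to compare against the IDA structure.
for name in ("EIP", "EFLAGS", "ESP", "REGS"):
    print(name, hex(getattr(VMContext, name).offset))
```

Since every field is a 32-bit value, no padding is inserted and the offsets should line up with the IDA structure.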



Let's continue our analysis of binary opcodes, and try to map the VM_INSTRUCTION format. We have already
encountered offsets +1 and +2 of our VM instruction, so let's examine the last one, +3:



01049F62     test dl, 2
01049F65     jz short loc_1049F77
01049F67     mov edi, [edi]
01049F69     movsx ecx, byte ptr [eax+3]
01049F6D     add edi, ecx
01049F6F     jmp short loc_1049F77
This byte is loaded directly into ecx using MOVSX. You should already understand what I'm about to say: why
MOVSX? This byte is then added to EDI, which contains our destination parameter. Why should we need to add
something to our parameter? Displacement, of course...

So, we can now rebuild the structure of binary instructions:



0000 OPCODE        db ?
0001 MODES         db ?
0002 REGS          db ?
0003 DISPLACEMENT  db ?
0004 DEST          dd ?
0008 SOURCE        dd ?



I agree I haven't commented much on this part. But the reason is that it is very 'VM-dependent'.
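For a dumper, the layout above maps naturally onto a struct format string. The Python sketch below assumes the full 12-byte form (three unsigned bytes, a signed displacement byte and two little-endian dwords); the names are invented, and since the decoder advances VM.EIP by a variable amount, not every instruction necessarily carries all of these fields.

```python
import struct
from collections import namedtuple

BinaryInstruction = namedtuple(
    "BinaryInstruction", "opcode modes regs displacement dest source")

def parse_binary_instruction(raw, offset=0):
    """Unpack the 12-byte layout: three bytes, a signed byte, two dwords."""
    return BinaryInstruction(*struct.unpack_from("<BBBbII", raw, offset))

# Made-up sample bytes, only to exercise the parser.
raw = bytes([0x03, 0x90, 0x21, 0xFC]) + struct.pack("<II", 0x0104B000, 0x12345678)
ins = parse_binary_instruction(raw)
print(ins.opcode, ins.regs >> 4, ins.regs & 0x0F, ins.displacement)
# prints 3 2 1 -4
```

The lowercase "b" in the format string does the job of MOVSX: the displacement byte 0FCh comes out as -4.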
3 Reversing Guidelines



The steps shown in the prior chapters are important steps toward the comprehension of a VM.

• You can initially skip the structure of a VM instruction, as long as it is not decrypted/decoded within each
instruction.

• At this point, we must examine the instruction set in depth, trying to find something recognizable, such as the
NOP instruction (which might not be included at all).

• Once the instruction set starts to become clear, at least in small part, special care must be taken in looking
for possible usages of VM registers. Sometimes their usage won't be clear, as they can be 'shifty', remapped upon
each VM entry, etc., but we don't care. Knowledge is incremental, and making errors is human, especially if you
abuse Zen to speed up your analysis by intuition ;-)

• At this point, we must attack the 'living heart' of the VM, its decoder. It contains all the important
information about the VM and the structure of the VM instructions, as it is usually responsible for scheduling and
performing the instruction (pre-)processing. You must remember that the decoder often has to analyse the VM
instruction to discover things like the opcode length, parameters and so on. But it is also possible that part of
this management is performed in the instructions themselves, e.g. by making instructions of fixed size (e.g. 16
bytes).

• And then? Then we must get back to the instruction set, trying to understand specific, non-standard opcodes
that perform creative duties that are usually not part of a processor (e.g. calls to 'real' functions, API
functions, calculation blocks, etc.).

• At this point we have decoded most of the VM, and we might try to debug an instruction or two to see if things
are as we expected, and if the VM registers follow our scheme.

• But sooner or later you have to get coding to dump the VM program in a comprehensible shape. You might wish to
write an IDA plugin (if you don't use 4.3 like me) or a script for decoding the VM program. Or, much simpler but
slightly less effective, you can code a logger, which is simply a hook in the VM instruction table for each
instruction (simply write your own debugger-loader and use breakpoints which you handle in the breakpoint event, or
inject a dll which hooks the table). Whenever an instruction is called, your hook dumps the opcode name and the
parameters. So, you can rebuild the flow of the program. A useful add-on to the logger is a VM.EIP dumper, which
allows you to assign the right key to each VM instruction, and possibly the ability to 'alter' the result of
conditional jumps, so as to allow the logger to examine most of the VM program and possibly 'skip' long loops.
Later, you can reassemble most of the VM program using the VM.EIP logged for each instruction.
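The logger idea can be illustrated in miniature. The sketch below is plain Python working on a toy handler table (a dict of callables), not on the real crackme, where the hook would be installed with a debugger or an injected dll as described above; all the names are invented.

```python
def install_logger(handler_table, names, log):
    """Wrap every handler so each call appends (vm_eip, name, operands) to log."""
    def make_hook(opcode, original):
        def hook(vm_eip, *operands):
            log.append((vm_eip, names.get(opcode, "OP_%02X" % opcode), operands))
            return original(vm_eip, *operands)    # then run the real handler
        return hook
    for opcode, original in list(handler_table.items()):
        handler_table[opcode] = make_hook(opcode, original)

# Toy handlers standing in for the VM instruction blocks.
table = {0x01: lambda eip, dst, src: dst + src, 0x03: lambda eip, dst, src: dst ^ src}
trace = []
install_logger(table, {0x01: "ADD", 0x03: "XOR"}, trace)
table[0x03](0x2000, 5, 1)          # executed first, but located later in the program
table[0x01](0x1000, 2, 3)
for vm_eip, name, operands in sorted(trace):    # rebuild program order by VM.EIP
    print(hex(vm_eip), name, operands)
```

Sorting the trace by the logged VM.EIP is what turns an execution log into something close to a listing of the VM program; instructions that were never executed stay invisible, which is why forcing conditional jumps helps.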



Well, I hope this can help you all understand VMs better. I saw that it is common style in tutorials to give
credits, so my thanks go to the Community and to my friends Zero and HAVOK.

Regards, 

Maximus 
For the curious, this is my IDA analysis of the binary parameter setup decoder of the crackme:



Setup_Binary_Instruction_Params proc near
    push ebp
    mov ebp, esp                        ; DH == +bytes after VM opcode
    add esp, -4
    mov eax, [ebp+VM_CONTEXT]
    push eax
    push dword ptr [esp]
    mov [ebp+DEST_IS_REGISTER], 0
    mov eax, [eax+VM.EIP]
    mov bl, [eax+INSTRUCTION.MODES]     ; addressing type byte
    mov dl, bl
    mov dh, 4                           ; min. operand size, preset.
    mov [ebp+operand_size], 0
    test dl, 1
    jz short loc_1049F36
    add [ebp+operand_size], 2

loc_1049F36:                            ; CODE XREF: Setup_Binary_Instruction_Params+22↑j
    test dl, 2
    jnz short loc_1049F3F
    add [ebp+operand_size], 1

loc_1049F3F:                            ; CODE XREF: Setup_Binary_Instruction_Params+2B↑j
    add [ebp+operand_size], 1
    and dl, 11100000b                   ; last 3 bits of byte are used in other manner*
    shr dl, 5
    pop edi
    mov esi, edi
    test dl, 100b                       ; is dest a memory ref?
    jz short SET_DEST_AS_MEMADDRESS
    mov cl, [eax+INSTRUCTION.REGS]      ; no, SET DEST AS VM GENERAL REGISTER
    and cl, 0F0h                        ; hi nibble is reg. index
    shr cl, 4
    lea edi, [edi+ecx*4+10h]            ; get pointer to vm register, (including esp in the count?)
    mov [ebp+DEST_IS_REGISTER], 1
    test dl, 10b
    jz short source_parameter           ; bl is +1 byte of instruction
    mov edi, [edi]                      ; implement the displacement in the register's access.
    movsx ecx, [eax+INSTRUCTION.DISPLACEMENT] ; load displacement index of register's, last byte of VM ...
                                        ; (NOTE: SIGN EXTENDED, to support backward jcc's)
    add edi, ecx                        ; add the displacement to the destination address
    jmp short source_parameter          ; bl is +1 byte of instruction

SET_DEST_AS_MEMADDRESS:                 ; CODE XREF: Setup_Binary_Instruction_Params+41↑j
    mov edi, [eax+4]
    add dh, [ebp+operand_size]

source_parameter:                       ; CODE XREF: Setup_Binary_Instruction_Params+57↑j
                                        ; Setup_Binary_Instruction_Params+61↑j
    mov dl, bl                          ; bl is +1 byte of instruction
    and dl, 111100b
    shr dl, 2                           ; note that shared bit isn't used above.
    mov cl, [eax+INSTRUCTION.REGS]
    test dl, 100b
    jz short loc_1049F93
    mov cl, [eax+INSTRUCTION.REGS]      ; SOURCE is register,
    and cl, 0Fh                         ; lower nibble is source
    mov esi, [esi+ecx*4+10h]
    jmp short calc_op_size

loc_1049F93:                            ; CODE XREF: Setup_Binary_Instruction_Params+77↑j
    mov esi, [eax+INSTRUCTION.SOURCE]
    cmp [ebp+DEST_IS_REGISTER], 1
    jnz short loc_1049F9F
    mov esi, [eax+4]

loc_1049F9F:                            ; CODE XREF: Setup_Binary_Instruction_Params+8C↑j
    add dh, [ebp+operand_size]

calc_op_size:                           ; CODE XREF: Setup_Binary_Instruction_Params+83↑j
    test dl, 2
    jz short update_eip
    test dl, 8
    jz short set_esi_dword
    movsx ecx, [eax+INSTRUCTION.DISPLACEMENT]
    add esi, ecx

set_esi_dword:                          ; CODE XREF: Setup_Binary_Instruction_Params+9C↑j
    mov esi, [esi]

update_eip:                             ; CODE XREF: Setup_Binary_Instruction_Params+97↑j
    mov cl, [eax]
    pop eax
    xor dl, dl
    shr edx, 8
    add [eax+VM.EIP], edx
    movzx ebx, [ebp+operand_size]
    shr ebx, 1
    leave
    retn 4
Setup_Binary_Instruction_Params endp
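
Reading the listing, the MODES byte appears to be split into bit groups: the top three bits describe the destination, bits 2-5 describe the source (one bit is shared, as a comment in the listing notes), and the two lowest bits feed the operand size. The Python sketch below only extracts those groups and the flags the decoder tests; the function and flag names are my own reading of the listing, not established facts.

```python
def decode_modes(modes):
    """Split the MODES byte the way Setup_Binary_Instruction_Params does."""
    dest = (modes & 0b11100000) >> 5       # and dl, 11100000b / shr dl, 5
    source = (modes & 0b00111100) >> 2     # and dl, 111100b / shr dl, 2
    # operand_size: +2 if bit 0 is set, +1 if bit 1 is clear, then +1 always
    operand_size = (2 if modes & 1 else 0) + (0 if modes & 2 else 1) + 1
    return {
        "dest_is_register": bool(dest & 0b100),
        "dest_is_indirect": bool(dest & 0b010),        # only tested for register dests
        "source_is_register": bool(source & 0b100),
        "source_is_dereferenced": bool(source & 0b010),
        "source_has_displacement": bool(source & 0b1000),
        "size_index": operand_size >> 1,               # movzx ebx, [...] / shr ebx, 1
    }

print(decode_modes(0b10010010))
```

The size index takes the values 0, 1 or 2, which would fit the byte/word/dword triples of the handler tables, although that link is a guess here.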